Extracting Concepts from Dynamic Legislative Text Collections

Authors

  • Gaël Dias
  • Sara Madeira
  • Gabriel Pereira Lopes
Abstract

Selecting discriminating terms to represent the contents of texts is a critical problem for many applications in Information Retrieval. Most Information Retrieval systems index documents by individual words, which are not specific enough to capture the contents of texts. As a consequence, there has been growing interest in techniques for automatic term extraction. In this context, we propose a new architecture for retrieving relevant documents in a dynamic text collection. It combines the SINO search engine with the SENTA software, which is designed for the automatic extraction of multiword lexemes. In this paper, we focus on the SENTA module, which has recently been added to the global architecture.

Similar resources

Knowledge Discovery in Textual Databases: A Concept-Association Mining Approach

Concepts are often related to short sequences of words that occur frequently together across the text collections. Such concepts convey much of the meaning in any language. Association rule mining is a powerful technique for extracting relations among concepts. From a text mining perspective, association rules have mainly been used in a traditional support-confidence framework. This approach su...

Perceptual knowledge construction from annotated image collections

This paper presents and evaluates new methods for extracting perceptual knowledge from collections of annotated images. The proposed methods include automatic techniques for constructing perceptual concepts by clustering the images based on visual and text feature descriptors, and for discovering perceptual relationships among the concepts based on descriptor similarity and statistics between t...

DBpedia based Ontological Concepts Driven Information Extraction from Unstructured Text

In this paper a knowledge base concept driven named entity recognition (NER) approach is presented. The technique is used for information extraction from news articles and linking it with background concepts in knowledge base. The work specifically focuses on extracting entity mentions from unstructured articles. The extraction of entity mentions from articles is based on the existing concepts ...

Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm

Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...

Analysis of unsupervised dimensionality reduction techniques

Domains such as text and images contain large amounts of redundancy and ambiguity among their attributes, which results in considerable noise (i.e. the data is high-dimensional). Retrieving data from high-dimensional datasets is a big challenge. Dimensionality reduction techniques have been a successful avenue for automatically extracting the latent concepts by removing the noise and...

Publication date: 2002